
    Have Your Cake and Eat It? Productive Parallel Programming via Chapel’s High-level Constructs

    Explicit parallel programming is required to exploit the growing parallelism in computer hardware. However, mainstream parallel notations, such as OpenMP and MPI, fall short in programmability. Chapel tries to tackle this problem by providing high-level constructs, but the performance implications of such constructs are unclear and need to be evaluated. The key contributions of this work are: (1) an evaluation of data parallelism and global-view programming in Chapel through the reduce and transpose benchmarks; (2) the identification of bugs in Chapel runtime code, with proposed fixes; and (3) a benchmarking framework that aids in conducting systematic and rigorous performance evaluations. Through examples, I show that data parallelism and global-view programming lead to clean and succinct code in Chapel. In the reduce benchmark, I found that data parallelism makes Chapel outperform the baseline. However, in the transpose benchmark, I found that global-view programming causes performance degradation in Chapel due to frequent implicit communication. I argue that this is not an inherent problem with Chapel and can be solved by compiler optimizations. The results suggest that parallel languages can use high-level abstractions to improve programmer productivity while still delivering competitive performance. Furthermore, the benchmarking framework I developed can aid the wider research community in performance evaluations.
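
    To make the contrast concrete: in Chapel, a whole-array reduction collapses to a single expression such as `const sum = + reduce A;`. Below is a minimal sketch (mine, not the paper's benchmark code) of the explicit OpenMP style that such high-level constructs replace.

```c
/* A minimal sketch (not the paper's benchmark) of an explicit OpenMP
 * reduction -- the mainstream style that Chapel's `+ reduce` abstracts away.
 * Compile with: cc -O2 -fopenmp reduce.c */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const size_t n = 1u << 20;
    double *a = malloc(n * sizeof *a);
    for (size_t i = 0; i < n; i++)
        a[i] = 1.0;

    double sum = 0.0;
    /* The programmer spells out both the parallel loop and the reduction. */
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i];

    printf("sum = %.1f\n", sum);
    free(a);
    return 0;
}
```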

    Power Consumption of Instruction Encodings on Cortex-M4

    Energy efficiency is increasingly important with the wider use of battery-powered devices. Many factors contribute to the power consumption of instructions, one of which is their encoding. The energy implications of these factors are unclear and need to be evaluated. The main contribution of this work is a characterization of the power consumption of some instruction encodings of ARMv7-M on the STM32L476VGT6 Cortex-M4 MCU. I also designed and implemented a self-contained power measurement infrastructure on the 32L476GDISCOVERY board. I found that the encoding of an instruction can affect the energy consumption of the MCU. The results of this work can make application programmers and hardware vendors aware of the energy characteristics of programs and devices. Future work on a more detailed model could allow us to predict energy consumption when designing new systems. A report submitted for the course COMP2300 (Advanced Studies Extension).
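
    As an illustration of what "instruction encodings" means here: Thumb-2 on ARMv7-M provides both 16-bit and 32-bit encodings of many instructions, and the width suffixes of ARM's unified assembler syntax let one force a specific encoding. The snippet below is my sketch of that mechanism, not the report's measurement harness.

```c
/* Sketch: forcing narrow (16-bit) vs. wide (32-bit) Thumb-2 encodings of
 * the same add, e.g. inside a loop whose supply current is being sampled.
 * Build with: arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -O2 -c encodings.c */
static inline void add_narrow(unsigned *x) {
    /* 16-bit T1 encoding (low registers only; sets flags, hence "cc") */
    __asm volatile ("adds %0, %0, #1" : "+l"(*x) : : "cc");
}

static inline void add_wide(unsigned *x) {
    /* The .w width suffix forces the 32-bit T3 encoding of the same add */
    __asm volatile ("add.w %0, %0, #1" : "+l"(*x));
}
```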

    Concurrent Copying Garbage Collection with Hardware Transactional Memory

    Many applications, such as video-based or transaction-based ones, are latency-critical. Any additional latency may greatly degrade the user experience, inflicting significant financial loss on the vendor. Recently, an increasing number of these applications are written in managed languages, such as C#, Java, JavaScript, and PHP, for productivity and reliability. Garbage collection (GC) provides automatic memory management to managed languages. However, GC can also induce pauses in the application, greatly affecting the user experience. This thesis explores the challenges of minimizing GC pauses. Concurrent GC reduces pauses by working concurrently with the application (the mutator). Copying GC improves mutator locality and reduces heap fragmentation. Concurrent copying GC achieves both, but requires heavyweight synchronization to ensure that the concurrently executing mutator has a consistent view of the heap while the collector changes it. Existing implementations of concurrent copying GC use read barriers or page protection to prevent the mutator from using stale references. Unfortunately, these synchronization mechanisms introduce high overhead to the mutator. My thesis is that, by using hardware transactional memory (HTM), mutators can execute transactionally during concurrent copying, achieving a consistent view of the heap but with lower overhead than read barriers or page protection. The contributions of this thesis are twofold. (1) I implement and evaluate a novel algorithm that uses HTM to reduce the mutator overhead of concurrent copying GC. (2) I conduct a detailed analysis of HTM capacity, filling a significant gap in the literature and informing the design of my HTM-based algorithm. I then use the insights on HTM capacity to implement several optimizations that improve the algorithm. Using Intel Transactional Synchronization Extensions (TSX) as a case study, I measure the transaction capacity of this popular HTM implementation and cross-validate the results against the literature, resolving ostensibly contradictory results. I also explore factors that may affect the effective capacity of transactions which, to the best of my knowledge, have not yet been reported in the literature. I implement the algorithm in MMTk, a framework for the design and implementation of GC. The implementation is evaluated on Intel TSX using several test programs. The results suggest that performing concurrent copying GC using HTM is viable. This work deepens the research community's understanding of HTM and its strengths and weaknesses. The strategies this work uses to fully exploit the capabilities of HTM can be generalized and applied to other applications of HTM. Finally, this work enables the design and implementation of concurrent copying GC with lower mutator overhead, given similar hardware support.
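
    For readers unfamiliar with the primitive involved, the sketch below shows the generic shape of an Intel TSX (RTM) transaction with a software fallback path. It is my illustration of the mechanism the thesis builds on, not the thesis's algorithm; the two helper functions are hypothetical.

```c
/* Generic Intel TSX (RTM) usage sketch -- not the thesis's algorithm.
 * If the mutator touches the heap inside a transaction and the collector
 * concurrently writes (copies) the same cache lines, hardware conflict
 * detection aborts the transaction and the fallback path runs instead.
 * Compile with: cc -mrtm mutator.c */
#include <immintrin.h>

void mutator_work(void);               /* hypothetical: plain heap accesses */
void mutator_work_with_barriers(void); /* hypothetical: conventional slow path */

void mutator_step(void) {
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        mutator_work(); /* sees a consistent view of the heap, no read barriers */
        _xend();        /* commit: no conflict with the collector occurred */
    } else {
        /* Aborted (conflict, capacity overflow, interrupt, ...): fall back
         * to a conventional mechanism such as read barriers or a lock. */
        mutator_work_with_barriers();
    }
}
```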

    Distilling the Real Cost of Production Garbage Collectors

    Abridged abstract: despite the long history of garbage collection (GC) and its prevalence in modern programming languages, there is surprisingly little clarity about its true cost. Without understanding their cost, crucial tradeoffs made by garbage collectors (GCs) go unnoticed. This can lead to misguided design constraints and evaluation criteria used by GC researchers and users, hindering the development of high-performance, low-cost GCs. In this paper, we develop a methodology that allows us to empirically estimate the cost of GC for any given set of metrics. By distilling out the explicitly identifiable GC cost, we estimate the intrinsic application execution cost under different GCs. The minimum distilled cost forms a baseline. Subtracting this baseline from the total execution costs, we can then place an empirical lower bound on the absolute costs of different GCs. Using this methodology, we study five production GCs in OpenJDK 17, a high-performance Java runtime. We measure the cost of these collectors and expose their respective key performance tradeoffs. We find that with a modestly sized heap, production GCs incur substantial overheads across a diverse suite of modern benchmarks, spending at least 7-82% more wall-clock time and 6-92% more CPU cycles relative to the baseline cost. We show that these costs can be masked by concurrency and generous provisioning of memory and compute. In addition, we find that newer low-pause GCs are significantly more expensive than older GCs, and, surprisingly, sometimes deliver worse application latency than stop-the-world GCs. Our findings reaffirm that GC is by no means a solved problem and that a low-cost, low-latency GC remains elusive. We recommend adopting the distillation methodology together with a wider range of cost metrics for future GC evaluations.
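
    One way to write down the distillation methodology described above, in notation of my choosing:

```latex
% For each collector g on a fixed workload and metric (time, cycles, ...):
%   T(g) = total execution cost,  G(g) = explicitly identifiable GC cost.
\begin{align*}
  B    &= \min_{g}\bigl(T(g) - G(g)\bigr)
       && \text{distilled baseline: intrinsic application cost}\\
  C(g) &\ge T(g) - B
       && \text{empirical lower bound on the absolute cost of collector } g
\end{align*}
```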

    Optogenetic Control of Non-Apoptotic Cell Death

    Herein, a set of optogenetic tools (designated LiPOP) is introduced that enables photoswitchable necroptosis and pyroptosis in live cells with varying kinetics. The LiPOP tools allow reconstruction of the key molecular steps involved in these two non-apoptotic cell death pathways by harnessing the power of light. Further, the use of LiPOPs coupled with upconversion nanoparticles or bioluminescence is demonstrated to achieve wireless optogenetic or chemo-optogenetic killing of cancer cells in multiple mouse tumor models. LiPOPs can trigger necroptotic and pyroptotic cell death in cultured prokaryotic or eukaryotic cells and in living animals, and they set the stage for studying the role of non-apoptotic cell death pathways during microbial infection and anti-tumor immunity.

    Verification of Concurrent Data Structures with TLA

    Concurrent systems have safety-critical applications, such as in aviation. Due to their inherent complexity, mechanised verification is well suited for reasoning about the safety and liveness properties of such systems. Temporal logics, such as TLA and LTL, have been used to verify distributed systems and protocols. However, it is not clear whether these logics are a good fit for modelling and verifying concurrent data structures. This work describes how the Temporal Logic of Actions (TLA) can be adapted to handle concurrent data structures and weak memory models. It also shows how the TLA toolchain, especially the model checker, can aid in cleanly applying the logical machinery to concrete programs. I used litmus tests to validate my encoding of memory models against prior work. These models enabled me to formalize various concurrent data structures, including the Chase-Lev queue. I then checked the behaviours of these data structures against abstract specifications of their operations. In particular, my modelling successfully finds bugs in a faulty implementation of the Chase-Lev queue. The results suggest that TLA is appropriate for modelling concurrent data structures. The formal models I designed and the related modelling techniques can be used by the wider research community in their verification work.
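
    Schematically, the checks described above take the canonical TLA form shown below (this is the standard shape of a TLA specification and refinement theorem, not the thesis's actual Chase-Lev model):

```latex
% Init: initial condition; Next: the permitted state transitions (actions);
% vars: the tuple of state variables; WF: weak fairness for liveness.
\begin{align*}
  Spec &\triangleq Init \land \Box[Next]_{vars} \land \mathrm{WF}_{vars}(Next)\\
  \textbf{THEOREM}\quad & Spec \Rightarrow AbstractSpec
    && \text{every behaviour refines the abstract operations}
\end{align*}
```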

    Activity Recognition in Videos with Segmented Streams

    We investigate a convolutional neural network (CNN) architecture for activity recognition in short video clips. Applications are ubiquitous, ranging from guiding unmanned vehicles to captioning video clips. While the use of CNN architectures on large image datasets (such as ImageNet) has been successfully demonstrated in many prior works, there is still no clear answer as to how one can adapt CNNs to video data. Several different architectures have been explored, such as C3D and two-stream networks. However, they all use the RGB frames of the video clips as-is. In this work, we introduce segmented streams, where each stream consists of the original RGB frames segmented by motion type. We find that, after training on the UCF101 dataset, we are able to improve over the original two-stream work by fusing our segmented streams.
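
    The abstract does not specify the fusion method; averaging per-class scores, as in the original two-stream work, is the common choice. A sketch of that late-fusion step (names and shapes are mine):

```c
/* Late fusion sketch: average per-class scores produced by independently
 * trained stream networks. NUM_CLASSES matches UCF101's 101 action classes. */
#define NUM_CLASSES 101

void fuse_streams(const float scores[][NUM_CLASSES], int num_streams,
                  float fused[NUM_CLASSES]) {
    for (int c = 0; c < NUM_CLASSES; c++) {
        float sum = 0.0f;
        for (int s = 0; s < num_streams; s++)
            sum += scores[s][c]; /* score of class c from stream s */
        fused[c] = sum / (float)num_streams;
    }
}
```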

    Occupational Health Risk Assessment in the Electronics Industry in China Based on the Occupational Classification Method and EPA Model

    Awareness of occupational health risk management in the electronics industry is weak in China, and many Chinese occupational health management regulations have not been effectively implemented. China’s current occupational hazards classification method and the internationally recognized Environmental Protection Agency (EPA) inhalation risk assessment model were used to perform health risk assessments for a chip manufacturing company in the electronics industry, in order to identify existing problems and put forward proposals for optimizing the occupational hazards classification method in China. The results showed that the detected concentrations of toxic and harmful chemicals at all testing points did not exceed China’s occupational health exposure limits. According to the EPA inhalation risk assessment model, the highest non-carcinogenic risk values for ammonia, chlorine, fluoride, sulfuric acid, hydrogen chloride, ethylene glycol, phosphine, boron trifluoride, isopropanol, benzene, and xylene were 5.10, 67.12, 1.71, 45.98, 1.83, 1.43, 160.35, 46.56, 2.52, 5.55, and 5.37, respectively, all above the threshold of 1, which means that workers in electronic chip manufacturing companies exposed to these chemicals face elevated occupational health risks. However, on the basis of the occupational hazards classification method, exposure to the same toxic and hazardous chemicals is classified as relatively harmless operations. The evaluation results of the EPA inhalation risk assessment model are thus generally higher than those of the occupational hazards classification method. It is recommended to refine the value of occupational exposure limit B, taking more characteristics of the hazard factors into account and fuzzifying the parameters, to optimize the occupational hazards classification method. At the same time, it is suggested that electronic chip manufacturing companies conduct chemical-hazard risk management covering three aspects: increasing awareness of occupational hazards, enhancing system ventilation, and improving personal health management measures.
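
    For reference, the EPA inhalation model behind the quoted non-carcinogenic values takes the standard hazard-quotient form (simplified here; the study's parameter choices are not given in the abstract):

```latex
% EC: exposure concentration; CA: measured air concentration;
% ET/EF/ED: exposure time, frequency, and duration; AT: averaging time;
% RfC: inhalation reference concentration of the chemical.
\begin{align*}
  EC &= \frac{CA \times ET \times EF \times ED}{AT}\\
  HQ &= \frac{EC}{RfC}
     && HQ > 1 \text{ indicates a potential non-carcinogenic health risk}
\end{align*}
```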

    V2O5 Nanospheres with Mixed Vanadium Valences as High Electrochemically Active Aqueous Zinc-Ion Battery Cathode

    A V4+-V2O5 cathode with mixed vanadium valences was prepared via a novel synthetic method using VOOH as the precursor, and its zinc-ion storage performance was evaluated. The products are hollow spheres consisting of nanoflakes. The V4+-V2O5 cathode exhibits prominent cycling performance, with a specific capacity of 140 mAh g⁻¹ after 1000 cycles at 10 A g⁻¹, and an excellent rate capability. The good electrochemical performance is attributed to the presence of V4+, which leads to higher electrochemical activity, lower polarization, faster ion diffusion, and higher electrical conductivity than V2O5 without V4+. This strategy of manipulating valence states may pave the way for designing high-performance cathodes and for elucidating advanced battery chemistry.